A system and method for improving speech recognition accuracy in the
presence of noise is provided. The speech recognition training unit is
modified to store digitized speech samples into a speech database that can
be accessed at recognition time. The improved recognition unit comprises a
noise analysis, modeling, and synthesis unit which continually analyzes the
noise characteristics present in the audio environment and produces an
estimated noise signal with similar characteristics. The recognition unit
then constructs a noise-compensated template database by adding the
estimated noise signal to each of the speech samples in the speech database
and performing parameter determination on the resulting sums. This
procedure accounts for the presence of noise in the recognition phase by
retraining all the templates using an estimated noise signal with
characteristics similar to those of the actual noise signal that corrupted
the word to be recognized. This method improves the likelihood of a good
template match, which increases the recognition accuracy.



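[Figure: block diagram of speech recognition unit 10. Speech x(n) enters the
parameter determination block, which passes template t(n) to pattern
comparison block 16; the pattern comparison block also reads template
database 14 and passes distances to the decision block, which outputs the
decision.]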







WO 99/40571 



PCT/US99/02280 




SYSTEM AND METHOD FOR NOISE-COMPENSATED 
SPEECH RECOGNITION 

BACKGROUND OF THE INVENTION 

I. Field of the Invention

The present invention relates to speech processing. More particularly,
the present invention relates to a system and method for the automatic
recognition of spoken words or phrases.

II. Description of the Related Art 

Digital processing of speech signals has found widespread use,
particularly in cellular telephone and PCS applications. One digital speech
processing technique is that of speech recognition. The use of speech
recognition is gaining importance due to safety reasons. For example,
speech recognition may be used to replace the manual task of pushing
buttons on a cellular phone keypad. This is especially important when a
user is initiating a telephone call while driving a car. When using a phone
without speech recognition, the driver must remove one hand from the
steering wheel and look at the phone keypad while pushing the buttons to
dial the call. These acts increase the likelihood of a car accident. Speech
recognition allows the driver to place telephone calls while continuously
watching the road and maintaining both hands on the steering wheel.
Handsfree carkits containing speech recognition will likely be a legislated
requirement in future systems for safety reasons.

Speaker-dependent speech recognition, the most common type in use
today, operates in two phases: a training phase and a recognition phase. In
the training phase, the speech recognition system prompts the user to speak
each of the words in the vocabulary once or twice so it can learn the
characteristics of the user's speech for these particular words or phrases.
The recognition vocabulary sizes are typically small (less than 50 words) and
the speech recognition system will only achieve high recognition accuracy
on the user that trained it. An example of a vocabulary for a handsfree
carkit system would include the digits on the keypad, the keywords "call",
"send", "dial", "cancel", "clear", "add", "delete", "history", "program", "yes",
and "no", as well as 20 names of commonly-called coworkers, friends, or
family members. Once training is complete, the user can initiate calls in the
recognition phase by speaking the trained keywords. For example, if the
name "John" was one of the trained names, the user can initiate a call to
John by saying the phrase "Call John." The speech recognition system
recognizes the words "Call" and "John", and dials the number that the user
had previously entered as John's telephone number.
A block diagram of a training unit 6 of a speaker-dependent speech
recognition system is shown in FIG. 1. Training unit 6 receives as input
s(n), a set of digitized speech samples for the word or phrase to be trained.
The speech signal s(n) is passed through parameter determination block 7,
which produces a template of N parameters {p(n), n = 1...N} capturing the
characteristics of the user's pronunciation of the particular word or phrase.
Parameter determination unit 7 may implement any of a number of speech
parameter determination techniques, many of which are well-known in the
art. An exemplary embodiment of a parameter determination technique is
the vocoder encoder described in U.S. Pat. No. 5,414,796, entitled
"VARIABLE RATE VOCODER," which is assigned to the assignee of the
present invention and incorporated by reference herein. An alternative
embodiment of a parameter determination technique is a fast Fourier
transform (FFT), where the N parameters are the N FFT coefficients. Other
embodiments derive parameters based on the FFT coefficients. Each spoken
word or phrase produces one template of N parameters that is stored in
template database 8. After training is completed over M vocabulary words,
template database 8 contains M templates, each containing N parameters.
Template database 8 is stored into some type of non-volatile memory so that
the templates stay resident when the power is turned off.

FIG. 2 is a block diagram of speech recognition unit 10, which operates
during the recognition phase of a speaker-dependent speech recognition
system. Speech recognition unit 10 comprises template database 14, which
in general will be template database 8 from training unit 6. The input to
speech recognition unit 10 is digitized input speech x(n), which is the speech
to be recognized. The input speech x(n) is passed into parameter
determination block 12, which performs the same parameter determination
technique as parameter determination block 7 of training unit 6. Parameter
determination block 12 produces a recognition template of N parameters
{t(n), n = 1...N} that models the characteristics of input speech x(n).
Recognition template t(n) is then passed to pattern comparison block 16 that
performs a pattern comparison between template t(n) and all the templates
stored in template database 14. The distances between template t(n) and
each of the templates in template database 14 are forwarded to decision block
18, which selects from template database 14 the template that most closely
matches recognition template t(n). The output of decision block 18 is the
decision as to which word in the vocabulary was spoken.
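The pattern comparison and decision steps can be illustrated with a minimal
sketch. The text does not name a distance measure, so the Euclidean distance
used here is an assumption, as are the function names and the toy template
values.

```python
import math

def euclidean_distance(a, b):
    """Distance between two equal-length parameter templates (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("templates must have the same number of parameters")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(recognition_template, template_database):
    """Return (word, distance) for the stored template closest to the input.

    template_database maps each vocabulary word to its template of N parameters.
    """
    # Pattern comparison: one distance per stored template.
    distances = {
        word: euclidean_distance(recognition_template, template)
        for word, template in template_database.items()
    }
    # Decision: pick the word whose template is nearest.
    best_word = min(distances, key=distances.get)
    return best_word, distances[best_word]

# Toy database with N = 3 parameters per template (values are made up).
database = {
    "call": [1.0, 0.2, -0.5],
    "send": [0.1, 0.9, 0.4],
    "dial": [-0.7, 0.3, 0.8],
}
word, dist = recognize([0.9, 0.25, -0.4], database)
```

A practical system would likely also reject the input when even the smallest
distance is large; that refinement is outside what the text describes.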

Recognition accuracy is a measure of how well a recognition system
correctly recognizes spoken words or phrases in the vocabulary. For
example, a recognition accuracy of 95% indicates that the recognition unit
correctly recognizes words in the vocabulary 95 times out of 100. In a
traditional speech recognition system, the recognition accuracy is severely
degraded in the presence of noise. The main reason for the loss of accuracy
is that the training phase typically occurs in a quiet environment but the
recognition typically occurs in a noisy environment. For example, a
handsfree carkit speech recognition system is usually trained while the car is
sitting in a garage or parked in the driveway, so the engine and air
conditioning are not running and the windows are usually rolled up.
However, recognition is normally used while the car is moving, so the
engine is running, there is road and wind noise present, the windows may
be down, etc. As a result of the disparity in noise level between the training
and recognition phases, the recognition template does not form a good
match with any of the templates obtained during training. This increases
the likelihood of a recognition error or failure.

FIG. 3 illustrates a speech recognition unit 20 which must perform
speech recognition in the presence of noise. As shown in FIG. 3, summer 22
adds speech signal x(n) with noise signal w(n) to produce noise-corrupted
speech signal r(n). It should be understood that summer 22 is not a physical
element of the system, but is an artifact of a noisy environment. The
noise-corrupted speech signal r(n) is input to parameter determination block
24, which produces noise-corrupted template t1(n). Pattern comparison block
28 compares template t1(n) with all the templates in template database 26,
which was constructed in a quiet environment. Since noise-corrupted
template t1(n) does not exactly match any of the training templates, there is
a high probability that the decision produced by decision block 30 may be a
recognition error or failure.

SUMMARY OF THE INVENTION 

The present invention is a system and method for the automatic
recognition of spoken words or phrases in the presence of noise.
Speaker-dependent speech recognition systems operate in two phases: a
training phase and a recognition phase. In the training phase of a
traditional speech recognition system, a user is prompted to speak all the
words or phrases in a specified vocabulary. The digitized speech samples for
each word or phrase are processed to produce a template of parameters
characterizing the spoken words. The output of the training phase is a
library of such templates. In the recognition phase, the user speaks a
particular word or phrase to initiate a desired action. The spoken word or
phrase is digitized and processed to produce a template, which is compared
with all the templates produced during training. The closest match
determines the action that will be performed. The main impairment limiting
the accuracy of speech recognition systems is the presence of noise. The
addition of noise during recognition severely degrades recognition accuracy,
because this noise was not present during training when the template
database was produced. The invention recognizes the need to account for the
particular noise conditions that are present at the time of recognition to
improve recognition accuracy.

Instead of storing templates of parameters, the improved speech
processing system and method stores the digitized speech samples for each
spoken word or phrase in the training phase. The training phase output is
therefore a digitized speech database. In the recognition phase, the noise
characteristics in the audio environment are continually monitored. When
the user speaks a word or phrase to initiate recognition, a
noise-compensated template database is constructed by adding a noise signal
to each of the signals in the speech database and performing parameter
determination on each of the speech plus noise signals. One embodiment of
this added noise signal is an artificially-synthesized noise signal with
characteristics similar to those of the actual noise. An alternative
embodiment is a recording of the time window of noise that occurred just
before the user spoke the word or phrase to initiate recognition. Since the
template database is constructed using the same type of noise that is present
in the spoken word or phrase to be recognized, the speech recognition unit
can find a good match between templates, improving the recognition
accuracy.



BRIEF DESCRIPTION OF THE DRAWINGS 

The features, objects, and advantages of the present invention will
become more apparent from the detailed description set forth below when
taken in conjunction with the drawings in which like reference characters
identify correspondingly throughout and wherein:

FIG. 1 is a block diagram of a training unit of a speech recognition
system;

FIG. 2 is a block diagram of a speech recognition unit;

FIG. 3 is a block diagram of a speech recognition unit which performs
speech recognition on a speech input corrupted by noise;

FIG. 4 is a block diagram of an improved training unit of a speech
recognition system; and

FIG. 5 is a block diagram of an exemplary improved speech
recognition unit.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 



This invention provides a system and method for improving speech
recognition accuracy when noise is present. It takes advantage of the recent
advances in computation power and memory integration and modifies the
training and recognition phases to account for the presence of noise during
recognition. The function of a speech recognition unit is to find the closest
match to a recognition template that is computed on noise-corrupted
speech. Since the characteristics of the noise may vary with time and
location, the invention recognizes that the best time to construct the
template database is during the recognition phase.

FIG. 4 shows a block diagram of an improved training unit 40 of a
speech recognition system. As opposed to the traditional training method
shown in FIG. 1, training unit 40 is modified to eliminate the parameter
determination step. Instead of storing templates of parameters, digitized
speech samples of the actual words and phrases are stored. Thus, training
unit 40 receives as input speech samples s(n), and stores digitized speech
samples s(n) in speech database 42. After training, speech database 42
contains M speech signals, where M is the number of words in the
vocabulary. Whereas the previous system and method of performing
parameter determination loses information about the speech characteristics
by only storing speech parameters, this system and method may preserve all
the speech information for use in the recognition phase.

FIG. 5 shows a block diagram of an improved speech recognition unit
50 for use in conjunction with training unit 40. The input to speech
recognition unit 50 is noise-corrupted speech signal r(n). Noise-corrupted
speech signal r(n) is generated by summer 52 adding speech signal x(n) with
noise signal w(n). As before, summer 52 is not a physical element of the
system, but is an artifact of a noisy environment.

Speech recognition unit 50 comprises speech database 60, which
contains the digitized speech samples that were recorded during the training
phase. Speech recognition unit 50 also comprises parameter determination
block 54, through which noise-corrupted speech signal r(n) is passed to
produce noise-corrupted template t1(n). As in a traditional voice
recognition system, parameter determination block 54 may implement any
of a number of speech parameter determination techniques.

An exemplary parameter determination technique uses linear
predictive coding (LPC) analysis techniques. LPC analysis techniques model
the vocal tract as a digital filter. Using LPC analysis, LPC cepstral coefficients
c(m) may be computed to be the parameters for representing the speech
signal. The coefficients c(m) are computed using the following steps. First,
the noise-corrupted speech signal r(n) is windowed over a frame of speech
samples by applying a window function v(n):

$$ y(n) = r(n)\,v(n), \qquad 0 \le n \le N-1 \tag{1} $$

In the exemplary embodiment, the window function v(n) is a Hamming
window and the frame size N is equal to 160. Next, the autocorrelation
coefficients are computed on the windowed samples using the equation:

$$ R(k) = \sum_{m=0}^{N-k} y(m)\,y(m+k), \qquad k = 1, 2, \ldots, P \tag{2} $$
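The windowing and autocorrelation steps of Equations (1) and (2) might be
sketched as follows. The function names, the 8 kHz sampling rate, and the test
tone are assumptions made for illustration. The sketch stops the sum at
m = N-1-k so that y(m+k) stays inside the frame (equivalent to treating y as
zero outside the frame), and it also returns R(0), which the later steps use.

```python
import math

def hamming_window(n_samples):
    """Hamming window v(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def autocorrelation(frame, order):
    """Return [R(0), ..., R(order)] of the Hamming-windowed frame."""
    window = hamming_window(len(frame))
    y = [r * v for r, v in zip(frame, window)]           # Equation (1)
    n = len(y)
    return [sum(y[m] * y[m + k] for m in range(n - k))   # Equation (2)
            for k in range(order + 1)]

# Hypothetical test frame: N = 160 samples of a 200 Hz tone at an assumed
# 8 kHz sampling rate, analyzed with P = 10.
frame = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(160)]
R = autocorrelation(frame, order=10)
```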



In the exemplary embodiment, P, the number of autocorrelation
coefficients to be computed, is equal to the order of the LPC predictor, which
is 10. The LPC coefficients are then computed directly from the
autocorrelation values using Durbin's recursion algorithm. The algorithm
may be stated as follows:

1. $E^{(0)} = R(0), \quad i = 1$ (3)

2. $k_i = \left\{ R(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R(i-j) \right\} / E^{(i-1)}$ (4)

3. $\alpha_i^{(i)} = k_i$ (5)

4. $\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\,\alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$ (6)

5. $E^{(i)} = (1 - k_i^2)\,E^{(i-1)}$ (7)

6. If i < P, go to step 2 with i = i + 1. (8)

7. The final solution for the LPC coefficients is given as

$a_j = \alpha_j^{(P)}, \quad 1 \le j \le P$ (9)
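A minimal sketch of Durbin's recursion, following the seven steps above, is
given here. The function name and the list layout (index j holds coefficient
j, index 0 unused) are assumptions made for the example.

```python
def levinson_durbin(R, order):
    """Solve for LPC coefficients a_1..a_P from autocorrelations R(0)..R(P).

    Returns (a, E): a[j-1] holds a_j, and E is the final prediction error.
    """
    if R[0] <= 0.0:
        raise ValueError("R(0) must be positive")
    alpha = [0.0] * (order + 1)    # alpha[j] holds the previous-order alpha_j
    E = R[0]                       # step 1
    for i in range(1, order + 1):
        acc = sum(alpha[j] * R[i - j] for j in range(1, i))
        k = (R[i] - acc) / E                              # step 2
        new_alpha = alpha[:]
        new_alpha[i] = k                                  # step 3
        for j in range(1, i):
            new_alpha[j] = alpha[j] - k * alpha[i - j]    # step 4
        alpha = new_alpha
        E = (1.0 - k * k) * E                             # step 5
    return alpha[1:], E            # step 7 (the loop itself is step 6)

# Autocorrelation of a first-order process, R(k) = 0.5**k: the recursion
# should recover a_1 = 0.5 and a_2 = 0.
a, E = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

Keeping a copy of the previous-order coefficients avoids overwriting values
that step 4 still needs within the same iteration.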

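The LPC coefficients are then converted to LPC cepstral coefficients using the
following equations:

$$ c(0) = \ln\big(R(0)\big) \tag{10} $$

$$ c(m) = a_m + \sum_{k=1}^{m-1} \left(\frac{k}{m}\right) c_k\, a_{m-k}, \qquad 1 \le m \le P \tag{11} $$

$$ c(m) = \sum_{k=1}^{m-1} \left(\frac{k}{m}\right) c_k\, a_{m-k}, \qquad m > P \tag{12} $$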



It should be understood that other techniques may be used for parameter 
determination instead of the LPC cepstral coefficients. 
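The conversion in Equations (10) through (12) might be sketched as below. The
function name is hypothetical, and the sketch assumes the usual convention
that a_j is taken as zero for j > P, which the text does not state explicitly.

```python
import math

def lpc_to_cepstrum(a, r0, n_ceps):
    """Convert LPC coefficients a_1..a_P to cepstral coefficients c(0)..c(n_ceps).

    a[j-1] holds a_j; r0 is the autocorrelation value R(0).
    """
    P = len(a)
    c = [0.0] * (n_ceps + 1)
    c[0] = math.log(r0)                               # Equation (10)
    for m in range(1, n_ceps + 1):
        acc = 0.0
        for k in range(1, m):
            if m - k <= P:                            # assumed: a_j = 0 for j > P
                acc += (k / m) * c[k] * a[m - k - 1]
        # Equation (11) adds a_m for m <= P; Equation (12) has no such term.
        c[m] = (a[m - 1] if m <= P else 0.0) + acc
    return c

# Single-pole check: for a_1 = 0.5 the cepstrum of 1/A(z) is 0.5**m / m.
c = lpc_to_cepstrum([0.5], r0=1.0, n_ceps=4)
```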

In addition, the signal r(n) is passed to speech detection block 56
which determines the presence or absence of speech. Speech detection block
56 may determine the presence or absence of speech using any of a number
of techniques. One such method is disclosed in the aforementioned U.S.
Patent No. 5,414,796, entitled "VARIABLE RATE VOCODER." This
technique analyzes the level of speech activity to make the determination
regarding the presence or absence of speech. The level of speech activity is
based on the energy of the signal in comparison with the background noise
energy estimate. First, the energy E(n) is computed for each frame, which in
a preferred embodiment is composed of 160 samples. The background noise
energy estimate B(n) may then be calculated using the equation:

$$ B(n) = \min\big[\, E(n),\ 5059644,\ \max\big(1.00547 \cdot B(n-1),\ B(n-1) + 1\big) \big] \tag{13} $$
If B(n) < 160000, three thresholds are computed using B(n) as follows:

$$ T_1(B(n)) = -(5.544613 \times 10^{-6})\,B^2(n) + 4.047152\,B(n) + 362 \tag{14} $$

$$ T_2(B(n)) = -(1.529733 \times 10^{-5})\,B^2(n) + 8.750045\,B(n) + 1136 \tag{15} $$

$$ T_3(B(n)) = -(3.957050 \times 10^{-5})\,B^2(n) + 18.89962\,B(n) + 3347 \tag{16} $$

If B(n) > 160000, the three thresholds are computed as:

$$ T_1(B(n)) = -(9.043945 \times 10^{-8})\,B^2(n) + 3.535748\,B(n) - 62071 \tag{17} $$

$$ T_2(B(n)) = -(1.986007 \times 10^{-7})\,B^2(n) + 4.941658\,B(n) + 223951 \tag{18} $$

$$ T_3(B(n)) = -(4.838477 \times 10^{-7})\,B^2(n) + 8.630020\,B(n) + 645864 \tag{19} $$

This speech detection method indicates the presence of speech when energy
E(n) is greater than threshold T2(B(n)), and indicates the absence of speech
when energy E(n) is less than threshold T2(B(n)). In an alternative
embodiment, this method can be extended to compute background noise
energy estimates and thresholds in two or more frequency bands.
Additionally, it should be understood that the values provided in Equations
(13)-(19) are experimentally determined, and may be modified depending on
the circumstances.
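The energy-based speech detection of Equations (13) through (19) can be
sketched as follows. Several details are assumptions, since the text does not
specify them: the initial value of B(n) (set here to the ceiling 5059644),
the handling of B(n) exactly equal to 160000 (sent to the second set of
thresholds), and all function names.

```python
B_MAX = 5059644.0

def update_background(energy, prev_background):
    """Equation (13): track the background noise energy estimate B(n)."""
    return min(energy, B_MAX,
               max(1.00547 * prev_background, prev_background + 1.0))

def thresholds(b):
    """Equations (14)-(19): return (T1, T2, T3) for background estimate b."""
    if b < 160000.0:
        t1 = -5.544613e-6 * b * b + 4.047152 * b + 362.0
        t2 = -1.529733e-5 * b * b + 8.750045 * b + 1136.0
        t3 = -3.957050e-5 * b * b + 18.89962 * b + 3347.0
    else:  # assumed to include b == 160000, which the text leaves open
        t1 = -9.043945e-8 * b * b + 3.535748 * b - 62071.0
        t2 = -1.986007e-7 * b * b + 4.941658 * b + 223951.0
        t3 = -4.838477e-7 * b * b + 8.630020 * b + 645864.0
    return t1, t2, t3

def detect_speech(frame_energies, initial_background=B_MAX):
    """Return one True/False speech decision per frame energy E(n)."""
    decisions = []
    b = initial_background          # assumed starting value for B(n)
    for e in frame_energies:
        b = update_background(e, b)
        _, t2, _ = thresholds(b)
        decisions.append(e > t2)    # speech present when E(n) > T2(B(n))
    return decisions

# Two quiet frames, one loud frame, then quiet again (made-up energies).
decisions = detect_speech([100.0, 100.0, 1.0e6, 100.0])
```

Because the estimate can rise by only about 0.5% (or 1) per frame but can drop
immediately to the current frame energy, it follows the quietest recent
frames, which is why a sudden loud frame exceeds the threshold.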

When speech detection block 56 determines that speech is absent, it
sends a control signal that enables noise analysis, modeling, and synthesis
block 58. It should be noted that in the absence of speech, the received signal
r(n) is the same as the noise signal w(n).

When noise analysis, modeling, and synthesis block 58 is enabled, it
analyzes the characteristics of noise signal r(n), models it, and synthesizes a
noise signal w1(n) that has similar characteristics to the actual noise w(n).
An exemplary embodiment for performing noise analysis, modeling, and
synthesis is disclosed in U.S. Pat. No. 5,646,991, entitled "NOISE
REPLACEMENT SYSTEM AND METHOD IN AN ECHO CANCELLER,"
which is assigned to the assignee of the present invention and incorporated
by reference herein. This method performs noise analysis by passing the
noise signal r(n) through a prediction error filter given by:

$$ A(z) = 1 - \sum_{i=1}^{P} a_i z^{-i} \tag{20} $$

where P, the order of the predictor, is 5 in the exemplary embodiment. The
LPC coefficients a_i are computed as explained earlier using equations (1)
through (9). Once the LPC coefficients are obtained, synthesized noise
samples can be generated with the same spectral characteristics by passing
white noise through the noise synthesis filter given by:

$$ \frac{1}{A(z)} = \frac{1}{1 - \sum_{i=1}^{P} a_i z^{-i}} \tag{21} $$



which is just the inverse of the filter used for noise analysis. After applying
a scaling factor to each of the synthesized noise samples to make the
synthesized noise energy equal to the actual noise energy, the output is the
synthesized noise w1(n).
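The noise analysis and synthesis steps might be sketched as below. This is an
illustration under stated assumptions rather than the method of the
referenced patent: the function names are hypothetical, the white noise is
Gaussian from a seeded generator, the Hamming window is omitted for brevity,
and "energy" is matched as average energy per sample so that the output
length may differ from the analyzed noise length.

```python
import math
import random

def autocorr(x, order):
    """Autocorrelation values R(0)..R(order) of the samples x."""
    n = len(x)
    return [sum(x[m] * x[m + k] for m in range(n - k)) for k in range(order + 1)]

def lpc(R, order):
    """Durbin's recursion; returns [a_1, ..., a_P]."""
    if R[0] <= 0.0:
        raise ValueError("noise segment has no energy")
    a = [0.0] * (order + 1)
    E = R[0]
    for i in range(1, order + 1):
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / E
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        E *= (1.0 - k * k)
    return a[1:]

def synthesize_noise(noise, n_out, order=5, rng=None):
    """Return n_out samples w1(n) shaped like the observed noise samples."""
    rng = rng or random.Random(0)
    a = lpc(autocorr(noise, order), order)        # analysis: coefficients of A(z)
    out = []
    for n in range(n_out):
        excitation = rng.gauss(0.0, 1.0)          # white noise input
        # All-pole synthesis filter 1/A(z): w(n) = e(n) + sum_i a_i * w(n - i)
        past = sum(a[i] * out[n - 1 - i] for i in range(min(order, n)))
        out.append(excitation + past)
    # Scale so the average energy per sample matches the observed noise.
    target = sum(s * s for s in noise) / len(noise)
    actual = sum(s * s for s in out) / len(out)
    gain = math.sqrt(target / actual)
    return [gain * s for s in out]

# Made-up "actual" noise: low-pass colored noise from a first-order recursion.
src = random.Random(1)
noise, prev = [], 0.0
for _ in range(800):
    prev = 0.9 * prev + src.gauss(0.0, 1.0)
    noise.append(prev)
w1 = synthesize_noise(noise, n_out=400)
```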

The synthesized noise w1(n) is added to each set of digitized speech
samples in speech database 60 by summer 62 to produce sets of synthesized
noise-corrupted speech samples. Then, each set of synthesized
noise-corrupted speech samples is passed through parameter determination
block 64, which generates a set of parameters for each set of synthesized
noise-corrupted speech samples using the same parameter determination
technique as that used in parameter determination block 54. Parameter
determination block 64 produces a template of parameters for each set of
speech samples, and the templates are stored in noise-compensated template
database 66. Noise-compensated template database 66 is a set of templates
that is constructed as if traditional training had taken place in the same type
of noise that is present during recognition. Note that there are many
possible methods for producing estimated noise w1(n) in addition to the
method disclosed in U.S. Pat. No. 5,646,991. An alternative embodiment is
to simply record a time window of the actual noise present when the user is
silent and use this noise signal as the estimated noise w1(n). The time
window of noise recorded right before the word or phrase to be recognized is
spoken is an exemplary embodiment of this method. Still another method
is to average various windows of noise obtained over a specified period.
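Construction of the noise-compensated template database might look like the
following sketch. The parameter determination used here (log energy and
zero-crossing rate) is only a hypothetical stand-in for whichever technique
blocks 54 and 64 share, and the function names, the repetition of the noise to
cover longer utterances, and the toy signals are all assumptions.

```python
import math
import random

def add_noise(speech, noise):
    """Summer 62: add estimated noise to a stored utterance, repeating the
    noise segment if the utterance is longer (an assumption)."""
    return [s + noise[n % len(noise)] for n, s in enumerate(speech)]

def toy_parameters(samples):
    """Stand-in parameter determination: [log energy, zero-crossing rate]."""
    energy = sum(x * x for x in samples) / len(samples)
    crossings = sum(1 for p, q in zip(samples, samples[1:])
                    if (p < 0.0) != (q < 0.0))
    return [math.log(energy + 1e-12), crossings / (len(samples) - 1)]

def build_noise_compensated_templates(speech_database, noise,
                                      parameters=toy_parameters):
    """One template per vocabulary word, computed on speech plus noise."""
    return {word: parameters(add_noise(samples, noise))
            for word, samples in speech_database.items()}

def recognize(noisy_input, templates, parameters=toy_parameters):
    """Return the word whose noise-compensated template is nearest."""
    t1 = parameters(noisy_input)
    def squared_distance(word):
        return sum((x - y) ** 2 for x, y in zip(t1, templates[word]))
    return min(templates, key=squared_distance)

# Toy speech database: two "words" represented by tones (made-up data).
speech_database = {
    "low": [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(320)],
    "high": [math.sin(2.0 * math.pi * 1000.0 * n / 8000.0) for n in range(320)],
}
rng = random.Random(0)
noise = [0.1 * rng.gauss(0.0, 1.0) for _ in range(160)]
templates = build_noise_compensated_templates(speech_database, noise)
```

The database is rebuilt whenever the noise estimate changes, so the cost grows
with the vocabulary size M; the small vocabularies mentioned earlier keep this
manageable.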

Referring still to FIG. 5, pattern comparison block 68 compares the
noise-corrupted template t1(n) with all the templates in noise-compensated
template database 66. Since the noise effects are included in the templates of
noise-compensated template database 66, decision block 70 is able to find a
good match for t1(n). By accounting for the effects of noise in this manner,
the accuracy of the speech recognition system is improved.

The previous description of the preferred embodiments is provided
to enable any person skilled in the art to make or use the present invention.
The various modifications to these embodiments will be readily apparent to
those skilled in the art, and the generic principles defined herein may be
applied to other embodiments without the use of the inventive faculty.
Thus, the present invention is not intended to be limited to the
embodiments shown herein but is to be accorded the widest scope consistent
with the principles and novel features disclosed herein.



WE CLAIM:

CLAIMS

1. A speech recognition system, comprising:

a training unit for receiving signals of words or phrases to be trained,
generating digitized samples for each said words or phrases, and storing said
digitized samples in a speech database; and

a speech recognition unit for receiving a noise corrupted input signal
to be recognized, generating a noise compensated template database by
applying the effects of noise to said digitized samples of said speech database,
and providing a speech recognition outcome for said noise corrupted input
signal based on said noise compensated template database.

2. The speech recognition system of claim 1 wherein said speech
recognition unit comprises:

a first parameter determination unit for receiving said noise
corrupted input signal and generating a template of parameters
representative of said input signal in accordance with a predetermined
parameter determination technique;

a second parameter determination unit for receiving said speech
database with the effects of noise applied to said digitized samples, and
generating said noise compensated template database in accordance with
said predetermined parameter determination technique; and

a pattern comparison unit for comparing said template of parameters
representative of said input signal with the templates of said noise
compensated template database to determine the best match and thereby
identify said speech recognition outcome.

3. The speech recognition system of claim 1 wherein said speech
recognition unit comprises:

a speech detection unit for receiving said noise corrupted input signal
and determining whether speech is present in said input signal, wherein
said input signal is designated a noise signal when speech is determined not
to be present in said input signal; and

a noise unit activated upon determining that speech is not present in
said input signal, said noise unit for analyzing said noise signal and
synthesizing a synthesized noise signal having characteristics of said noise
signal, said synthesized noise signal for applying the effects of noise to said
digitized samples of said speech database.

4. The speech recognition system of claim 2 wherein said speech
recognition unit further comprises:

a speech detection unit for receiving said noise corrupted input signal
and determining whether speech is present in said input signal, wherein
said input signal is designated a noise signal when speech is determined not
to be present in said input signal; and

a noise unit activated upon determining that speech is not present in
said input signal, said noise unit for analyzing said noise signal and
synthesizing a synthesized noise signal having characteristics of said noise
signal, said synthesized noise signal for applying the effects of noise to said
digitized samples of said speech database.

5. The speech recognition system of claim 2, wherein said
parameter determination technique is a linear predictive coding (LPC)
analysis technique.

6. The speech recognition system of claim 4, wherein said
parameter determination technique is a linear predictive coding (LPC)
analysis technique.

7. The speech recognition system of claim 3, wherein said speech
detection unit determines the presence of speech by analyzing the level of
speech activity in said input signal.

8. The speech recognition system of claim 4, wherein said speech
detection unit determines the presence of speech by analyzing the level of
speech activity in said input signal.

9. The speech recognition system of claim 3, wherein said noise
unit analyzes and synthesizes said synthesized noise signal using a linear
predictive coding (LPC) technique.

10. The speech recognition system of claim 3, wherein said
synthesized noise signal corresponds to a window of said noise signal
recorded right before said input signal to be recognized.

11. The speech recognition system of claim 3, wherein said
synthesized noise signal corresponds to an average of various windows of
said noise signal recorded over a predetermined period of time.

12. The speech recognition system of claim 4, wherein said noise
unit analyzes and synthesizes said synthesized noise signal using a linear
predictive coding (LPC) technique.

13. The speech recognition system of claim 4, wherein said
synthesized noise signal corresponds to a window of said noise signal
recorded right before said input signal to be recognized.

14. The speech recognition system of claim 4, wherein said
synthesized noise signal corresponds to an average of various windows of
said noise signal recorded over a predetermined period of time.

15. A training unit of a speech recognition system which accounts
for the effects of a noisy environment, comprising:

means for receiving signals of words or phrases to be trained;

means for generating digitized samples for each said words or
phrases; and

means for storing said digitized samples in a speech database.

16. A speech recognition unit of a speech recognition system for
recognizing an input signal, said speech recognition unit accounting for the
effects of a noisy environment, comprising:

means for storing digitized samples of words or phrases of a
vocabulary in a speech database;

means for applying the effects of noise to said digitized samples of
said vocabulary to generate noise corrupted digitized samples of said
vocabulary;

means for generating a noise compensated template database based on
said noise corrupted digitized samples; and

means for determining a speech recognition outcome for said input
signal based on said noise compensated template database.

17. The speech recognition unit of claim 16, further comprising:

first parameter determination means for receiving said input signal
and generating a template of parameters representative of said input signal
in accordance with a predetermined parameter determination technique;
and

second parameter determination means for receiving said noise
corrupted digitized samples of said vocabulary and generating the templates
of said noise compensated template database in accordance with said
predetermined parameter determination technique;

wherein said means for determining said speech recognition outcome
compares said template of parameters representative of said input signal
with the templates of said noise compensated template database to
determine the best match and thereby identify said speech recognition
outcome.

18. The speech recognition unit of claim 16 wherein said means
for applying the effects of noise comprises:

means for determining whether speech is present in said input signal,
wherein said input signal is designated a noise signal when speech is
determined not to be present in said input signal; and

means for analyzing said noise signal and synthesizing a synthesized
noise signal, said synthesized noise signal added to said digitized samples of
said vocabulary.

19. The speech recognition unit of claim 17 wherein said means
for applying the effects of noise comprises:

means for determining whether speech is present in said input signal,
wherein said input signal is designated a noise signal when speech is
determined not to be present in said input signal; and

means for analyzing said noise signal and synthesizing a synthesized
noise signal, said synthesized noise signal added to said digitized samples of
said vocabulary.

20. A method for speech recognition accounting for the effects of a
noisy environment, comprising the steps of:

generating digitized samples of each word or phrase trained, each said
word or phrase belonging to a vocabulary;

storing said digitized samples in a speech database;

receiving an input signal to be recognized;

applying the effects of noise to said digitized samples of said
vocabulary to generate noise corrupted digitized samples of said vocabulary;

generating a noise compensated template database based on said noise
corrupted digitized samples; and

providing a speech recognition outcome for said noise corrupted
input signal based on said noise compensated template database.

21. The method of speech recognition of claim 20, further
comprising the steps of:

generating a template of parameters representative of said input
signal in accordance with a predetermined parameter determination
technique; and

generating templates for said noise compensated template database in
accordance with said predetermined parameter determination technique;

wherein said step of providing a speech recognition outcome
compares said template of parameters representative of said input signal
with said templates of said noise compensated template database to
determine the best match and thereby identify said speech recognition
outcome.

22. The method of speech recognition of claim 20 wherein said
step of applying the effects of noise comprises the steps of:

determining whether speech is present in said input signal, wherein
said input signal is designated a noise signal when speech is determined not
to be present in said input signal; and

analyzing said noise signal and synthesizing a synthesized noise
signal, said synthesized noise signal added to said digitized samples of said
vocabulary to generate said noise corrupted digitized samples.

23. The method of speech recognition of claim 21 wherein said
step of applying the effects of noise comprises the steps of:

determining whether speech is present in said input signal, wherein
said input signal is designated a noise signal when speech is determined not
to be present in said input signal; and

analyzing said noise signal and synthesizing a synthesized noise
signal, said synthesized noise signal added to said digitized samples of said
vocabulary to generate said noise corrupted digitized samples.